CiberAMP requires several packages that will be automatically installed:
r Biocpkg("TCGAbiolinks")
, for TCGA data downloading, filtering, normalization and differential expression analysis (DEA).r Biocpkg("SummarizedExperiment")
, to manage RNAseq experiments data.r Biocpkg("EDASeq")
, to normalize RNAseq counts.r Biocpkg("edgeR")
, to perform differential expression analysis (DEA) between two groups of samples.r Biocpkg("RTCGAToolbox")
, to download latest GISTIC2.0 tun thresholded-by-file data from Broad's Firehouse data server.r Biocpkg("dplyr")
& r Biocpkg("stringr")
, to manage strings in R.r Biocpkg("ggplot2")
& r Biocpkg("plotly")
& r Biocpkg("shiny")
, to plot static (ggplot2) and interactive (plotly) graphs with the results.The algorithm requires two mandatory inputs. The first one is a list of gene symbols to analyze. This can range from a single transcript to the whole set of human genes.
genes.of.interest <- c("GENEA", "GENEB", "GENEC", ...)
In this example, we will study which are genes are differentially expressed in association with their SCNVs in the cohort of head-and-neck and lung squamous cell carcinomas (TCGA-HNSC and TCGA-LUSC, respectively). In particular, we will focus our analysis on genes lodge at chromosomes 3, 5 and 8.
The first step is to create a vector with the symbols of the genes that are located at chromosomes 3, 5 and 8:
library(biomaRt) library(stringr)
# First, we get all the genes at chromosomes 3, 5 and 8 using the biomaRt R package. If you have the last version of R, the you can set "TRUE" the "useCache" argument in getBM function. ensembl <- useEnsembl(biomart = "ensembl", dataset = "hsapiens_gene_ensembl", mirror = "useast") genes <- getBM(attributes=c('chromosome_name', 'band', 'start_position', 'end_position', 'strand', 'hgnc_symbol'), filters = "chromosome_name", values = c("3", "5", "8"), mart=ensembl, useCache = FALSE) genes <- genes[genes$hgnc_symbol != "", ] genes <- genes[!duplicated(genes$hgnc_symbol), ] for(i in 1:nrow(genes)) { if(genes$strand[i] < 0) { genes$strand[i] <- "-" }else{ genes$strand[i] <- "+" } }
Now, we have (1) the list of gene symbols to query and (2) the TCGA cohorts to analyze (TCGA-HNSC and TCGA-LUSC). CiberAMP also allows you to designate a path to save your results. By default, it will use the current working directory.
library(ciberAMP) # x <- ciberAMP(genes = reg.genes$hgnc_symbol, cohorts = c("LUSC", "HNSC"), pat.percentage = 0, *writePath = "PATH_TO_FOLDER"*)
However, there are many arguments that allow you to personalize your analysis:
TCGAanalyze_Preprocessing
.gcContent
or geneLength
(default).quantile
(default), varFilter
, filter1
, filter2
.IQR
. See genefilter
documentation for available methods.filt.var.funct
.filter1
. Defaults to 0.05.Deep
, Shallow
or Both
NULL
.NULL
.In this example, we will run the analysis with default parameters:
x <- ciberAMP(genes = as.character(genes$hgnc_symbol), cohorts = c("HNSC", "LUSC"))
CiberAMP results into a list of 3 data frames that can be accessed by:
The x[[1]] data frame contains all differentially expressed genes between (1) tumor vs normal and (2) copy number altered vs diploid tumor samples:
#To get the first queried gene (first row) results. head(x[[1]][1, ])
The x[[2]] data frame containing CGC list differential expression results. Columns are the same as x[[1]].
head(x[[2]][1, ])
Finally, the x[[3]] data frame contains SCNVs co-occurrency between the SCNV-DEGs and known cancer driver genes amplification and deletions:
head(x[[3]][1, ])
From the original 1870 genes we found that:
They are mapped at their chromosomal positions:
library(chromPlot) df <- x[[1]] genes_cosmic <- x[[3]] gr.assoc.hnsc <- gr[gr$hgnc_symbol %in% as.character(df[df$Tumor %in% "HNSC" & abs(df$log2FC.SCNAvsDip) >= 1 & df$FDR.SCNAvsDip <= 0.01, ]$Gene_Symbol), ] gr.assoc.lusc <- gr[gr$hgnc_symbol %in% as.character(df[df$Tumor %in% "LUSC" & abs(df$log2FC.SCNAvsDip) >= 1 & df$FDR.SCNAvsDip <= 0.01, ]$Gene_Symbol), ] both <- intersect(gr.assoc.hnsc$hgnc_symbol, gr.assoc.lusc$hgnc_symbol) chromPlot(gaps=hg_gap, annot1 = gr[gr$hgnc_symbol %in% both, ], chr = c(3,5,8))
To answer if these genes are very close to any SCN-altered CGC in the cohort, we visually plot them (red bars) and the CGC co-occurring ones (yellow bars):
library(chromPlot) df.cosmic <- x[[2]] gr.assoc.hnsc.cosmic <- gr[gr$hgnc_symbol %in% as.character(df.cosmic[df.cosmic$Tumor %in% "HNSC" & abs(df.cosmic$log2FC.SCNAvsDip) >= 1 & df.cosmic$FDR.SCNAvsDip <= 0.01, ]$Gene_Symbol), ] gr.assoc.lusc.cosmic <- gr[gr$hgnc_symbol %in% as.character(df.cosmic[df.cosmic$Tumor %in% "LUSC" & abs(df.cosmic$log2FC.SCNAvsDip) >= 1 & df.cosmic$FDR.SCNAvsDip <= 0.01, ]$Gene_Symbol), ] both.cosmic <- intersect(gr.assoc.hnsc.cosmic$hgnc_symbol, gr.assoc.lusc.cosmic$hgnc_symbol) chromPlot(gaps=hg_gap, annot1 = gr[gr$hgnc_symbol %in% both, ], annot2 = gr[gr$hgnc_symbol %in% both.cosmic, ], chr = c(3,5,8), chrSide=c(-1,1,1,1,1,1,1,1))
CiberAMP package provides a ggplot-based function to create a scatter plot where:
In the end, a significantly altered gene can be classified in 8 scenarios:
Let's take a look on those 67 resulting genes:
library(ggplot2) df.exp <- x[[1]] df.exp <- df.exp[df.exp$Gene_Symbol %in% both, ] ggplot.CiberAMP(df.exp)
Finally, CiberAMP provides a function to interactively explore your results. In this app, you can filter genes by:
Clicking on any dot will display a table with the overlapping % of samples in which a SCN-deregulated queried genes co-occurs with any CGC oncodrivers' SCNAs.
Tip: Click on the 'submit' button to initialize the graph!! If 'ALL' is written in the box, then every SCN-altered queried gene from your results will be plotted.
library(shiny) library(plotly) library(DT) int.plot.CiberAMP(df.exp, x[[3]])
sessionInfo()
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.